Database Clustering and Data Warehousing

Authors

  • Mei-Ling Shyu
  • Shu-Ching Chen
Abstract

Due to the complexity of real-world applications, the number of databases and the volume of data have increased tremendously. Discovering qualitative and quantitative patterns from databases in such a distributed information-providing environment has been recognized as a challenging task. In response to this demand, data mining and data warehousing techniques are emerging to extract previously unknown and potentially useful knowledge and so provide better decision support. This paper presents a mechanism called Markov Model Mediators (MMMs) that facilitates the understanding of data warehouse schemas/views and improves query processing performance by analyzing and discovering summarized knowledge at the database level. Simulation results show that the data mining process leads to a better federation of data warehouses and reduces the cost of query processing. To illustrate these benefits, our approach has been implemented, and a simple example and several experiments on real databases are presented.

Similar resources

Transbase: a Leading-edge ROLAP Engine Supporting Multidimensional Indexing and Hierarchy Clustering

Analysis-oriented database applications, such as data warehousing or customer relationship management, play a crucial role in the database area. In general, the multidimensional data model is used in these applications, realized as star or snow-flake schemata in the relational world. The so-called star queries are the prevalent type of queries on such schemata. All database vendors have extende...

Rough Set Theory and Fuzzy Logic Based Warehousing of Heterogeneous Clinical Databases

Large amounts of data about the patients with their medical conditions are presented in the Medical databases. Analyzing all these databases is one of the difficult tasks in the medical environment. In order to warehouse all these databases and to analyze the patient's condition, we need an efficient data mining technique. In this paper, an efficient data mining technique for warehousing clinic...

Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2

In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This feature is also known as Rollout or Bulk Deletes. Rollout is generally carried out periodically and is often done on more than one dimension or attribute. The ability to efficiently handle the updates of RID indexes while doing Rollouts is a well known problem for databa...

Conceptual Clustering of Heterogeneous Distributed Databases

With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...

Incremental Clustering for Mining in a Data Warehousing Environment

Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, i...

Publication date: 1998